An Approximate Lp-Difference Algorithm for Massive Data Streams
Abstract
Several recent papers have shown how to approximate the difference $\sum_i |a_i - b_i|$ or $\sum_i |a_i - b_i|^2$ between two functions, when the function values $a_i$ and $b_i$ are given in a data stream, and their order is chosen by an adversary. These algorithms use little space (much less than would be needed to store the entire stream) and little time to process each item in the stream. They approximate with small relative error. Using different techniques, we show how to approximate the Lp-difference $\sum_i |a_i - b_i|^p$ for any rational-valued $p \in (0,2]$, with comparable efficiency and error. We also show how to approximate $\sum_i |a_i - b_i|^p$ for larger values of $p$ but with a worse error guarantee. Our results fill in gaps left by recent work, by providing an algorithm that is precisely tunable for the application at hand.
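To make the target quantity concrete, here is a minimal Python sketch of an exact baseline that computes $\sum_i |a_i - b_i|^p$ from an arbitrarily ordered stream. It is not the paper's algorithm: it stores one number per distinct index, which is the linear space cost that the small-space approximation avoids. The function name `lp_difference`, the `(source, index, value)` item format, and the choice to sum repeated indices are all assumptions made for illustration.

```python
from collections import defaultdict


def lp_difference(stream, p):
    """Exact sum_i |a_i - b_i|^p over a stream of (source, index, value) items.

    Assumed item format (hypothetical): `source` is "a" or "b", and values
    arriving for the same index are added together. Items may arrive in
    any order.
    """
    diff = defaultdict(float)  # running value of a_i - b_i for each index i
    for source, i, value in stream:
        diff[i] += value if source == "a" else -value
    return sum(abs(d) ** p for d in diff.values())


# Items interleaved in an arbitrary order, as an adversary might choose.
stream = [("b", 2, 1.0), ("a", 0, 3.0), ("a", 2, 5.0), ("b", 0, 1.0), ("b", 7, 2.0)]
l1 = lp_difference(stream, 1)  # |3-1| + |5-1| + |0-2|
l2 = lp_difference(stream, 2)  # 2^2 + 4^2 + 2^2
```

A baseline like this is mainly useful as ground truth when measuring the relative error of an approximate, small-space estimator on test data.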
Related papers
An Approximate L1-Difference Algorithm for Massive Data Streams
We give a space-efficient, one-pass algorithm for approximating the L1 difference $\sum_i |a_i - b_i|$ between two functions, when the function values $a_i$ and $b_i$ are given as data streams, and their order is chosen by an adversary. Our main technical innovation is a method of constructing families $\{V_j\}$ of limited-independence random variables that are range-summable, by which we mean that $\sum_{j=0}^{c-1} V_j(s)$ i...
An Approximate L-Difference Algorithm for Massive Data Streams
Massive data sets are increasingly important in a wide range of applications, including observational sciences, product marketing, and monitoring and operations of large systems. In network operations, raw data typically arrive in streams, and decisions must be made by algorithms that make one pass over each stream, throw much of the raw data away, and produce “synopses” or “sketches” for furth...
An Approximate Lp-Difference Algorithm for Massive Data Streams
Several recent papers have shown how to approximate the difference $\sum_i |a_i - b_i|$ or $\sum_i |a_i - b_i|^2$ between two functions, when the function values $a_i$ and $b_i$ are given in a data stream, and their order is chosen by an adversary. These algorithms use little space (much less than would be needed to store the entire stream) and little time to process each item in the stream and approximate with sma...
Streaming Algorithms for Distributed, Massive Data Sets
Massive data sets are increasingly important in a wide range of applications, including observational sciences, product marketing, and monitoring and operations of large systems. In network operations, raw data typically arrive in streams, and decisions must be made by algorithms that make one pass over each stream, throw much of the raw data away, and produce "synopses" or "sketches" for furth...
Fast Mining of Massive Tabular Data via Approximate Distance Computations
Tabular data abound in many data stores: traditional relational databases store tables, and new applications also generate massive tabular datasets. For example, consider the geographic distribution of cell phone traffic at different base stations across the country or the evolution of traffic at Internet routers over time. Detecting similarity patterns in such data sets (e.g., which geographi...
Journal title:
- Discrete Mathematics & Theoretical Computer Science
Volume 4
Publication date: 2000